This analysis investigates county-level shifts in US Presidential elections between 2020 and 2024, using geographic data and election results to visualize and analyze political trends.
Introduction
Understanding political shifts at the granular level of US counties provides valuable insights into the evolving political landscape of the nation. This project investigates the changes in voting patterns between the 2020 and 2024 US Presidential Elections, using county-level data to explore how political allegiance may have shifted geographically. By examining both spatial and statistical trends, we aim to identify regions where significant changes occurred, explore potential underlying causes, and provide visual evidence of political realignments.
The analysis incorporates geographic shapefiles for US counties and election result data extracted directly from Wikipedia. Leveraging the power of the R programming language and libraries such as sf, leaflet, tidyverse, and rvest, we retrieve, clean, and merge datasets to create a comprehensive view of electoral behavior across states. The use of interactive mapping allows for accessible visualization of complex data, supporting both qualitative and quantitative interpretations.
Ultimately, this project offers a data-driven perspective on how voter preferences may be shifting across the country. These insights can inform political strategists, researchers, and citizens interested in the dynamics of American democracy at the local level.
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.2 ✔ tibble 3.2.1
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.0.4
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(httr2)
library(rvest)
Attaching package: 'rvest'
The following object is masked from 'package:readr':
guess_encoding
library(sf)
Linking to GEOS 3.13.0, GDAL 3.10.1, PROJ 9.5.1; sf_use_s2() is TRUE
This chunk downloads and reads the shapefile containing US county boundaries. It first checks if a directory (data/mp04) exists and creates it if needed. It then defines the URL of the shapefile (from the US Census Bureau) and downloads the ZIP file if it’s not already present. Once downloaded, the ZIP is extracted, and the shapefile is read using the sf package into an object called county_shapes. Finally, glimpse() provides a quick overview of the dataset’s structure, helping us understand the attributes and geometry data associated with each county.
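The chunk itself is not shown here, so the following is only a minimal sketch of the download-and-read step as described. The helper names `download_if_missing()` and `read_county_shapes()` are hypothetical, and the commented URL is an assumption (the Census Bureau's 2023 cartographic boundary county file); the original chunk may have used a different vintage or resolution.

```r
# Hypothetical helper: download a file only if it is not already on disk.
download_if_missing <- function(url, dest) {
  dir.create(dirname(dest), recursive = TRUE, showWarnings = FALSE)
  if (!file.exists(dest)) {
    download.file(url, destfile = dest, mode = "wb")
  }
  invisible(dest)
}

# Hypothetical helper: fetch the ZIP, extract it, and read the shapefile with sf.
read_county_shapes <- function(url, data_dir = file.path("data", "mp04")) {
  zip_path <- download_if_missing(url, file.path(data_dir, basename(url)))
  unzip(zip_path, exdir = data_dir)
  shp_path <- list.files(data_dir, pattern = "\\.shp$", full.names = TRUE)[1]
  sf::read_sf(shp_path)
}

# Assumed URL, not taken from the original chunk:
# county_url <- "https://www2.census.gov/geo/tiger/GENZ2023/shp/cb_2023_us_county_500k.zip"
# county_shapes <- read_county_shapes(county_url)
# dplyr::glimpse(county_shapes)
```

Guarding the download with `file.exists()` keeps repeated renders of the report from hitting the Census server again.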
Task 2: Acquire 2024 US Presidential Election Results
The function returns the cleaned and merged dataset combined_elections.
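The scraping function is not reproduced in this section. As a rough illustration of the kind of step it likely involves, the sketch below parses an HTML results table with rvest and converts the vote counts to numbers. The inline page, its numbers, the column names, and the helper `extract_results()` are all made up for the example; real Wikipedia tables vary from state to state and usually need extra per-state cleanup.

```r
library(rvest)

# Stand-in for a downloaded results page (made-up numbers).
page <- minimal_html('
  <table class="wikitable">
    <tr><th>County</th><th>Trump</th><th>Harris</th><th>Total</th></tr>
    <tr><td>Alpha</td><td>1,200</td><td>800</td><td>2,050</td></tr>
    <tr><td>Beta</td><td>300</td><td>1,700</td><td>2,020</td></tr>
  </table>')

# Hypothetical helper: read the first wikitable and clean it for joining.
extract_results <- function(page, state) {
  tbl <- page |>
    html_element("table.wikitable") |>
    html_table()
  count_cols <- setdiff(names(tbl), "County")
  for (col in count_cols) {
    # Remove thousands separators before converting to numeric.
    tbl[[col]] <- as.numeric(gsub(",", "", as.character(tbl[[col]])))
  }
  # Upper-case names so they line up with the keys used for joining.
  tbl$County <- toupper(tbl$County)
  tbl$State <- toupper(state)
  tbl
}

results <- extract_results(page, "Example State")
```

Running the same helper over each state's page and binding the rows would give one long table per election year, which can then be merged into a single data set such as combined_elections.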
3. Prepare County Shapefiles for Visualization
Function: prepare_shapes()
Join Shapefile Data with Election Data:
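Create a Join Key for the Shapefile: As with the election data, STATE_NAME and COUNTY_NAME are combined into a single key for each county. The code stores this key in a column named FIPS, although it is built from names (for example, TEXAS_HARRIS) rather than from numeric FIPS codes.
Join Election Data: Merges county_shapes (geospatial data) with combined_elections on the FIPS key.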
Reposition Alaska and Hawaii:
Shifting Locations for Visualization: Alaska and Hawaii are repositioned to avoid overlap and improve map readability.
Transform to Web Mercator CRS: The data is transformed into EPSG:3857 (Web Mercator), which is suitable for visualization.
Shifting Alaska: Moves Alaska down and right using a specific transformation applied to its geometry.
Shifting Hawaii: Moves Hawaii right using a similar transformation.
The function returns the adjusted shapefile data county_data_transformed.
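The body of prepare_shapes() is not shown, so the following is a sketch of its two main ideas under stated assumptions: a name-based key and a simple translation of selected geometries. The helper names `make_key()` and `shift_geometry_by()` and the offsets in the usage comment are hypothetical; the real function may also rescale or rotate Alaska.

```r
# Hypothetical helper: build the name-based join key used on both tables.
make_key <- function(state, county) {
  paste0(toupper(trimws(state)), "_", toupper(trimws(county)))
}

# Hypothetical helper: translate every geometry in `x` by (dx, dy) map units.
# Arithmetic on an sfc column can drop its CRS, so the CRS is set again afterwards.
shift_geometry_by <- function(x, dx, dy) {
  crs <- sf::st_crs(x)
  moved <- sf::st_geometry(x) + c(dx, dy)
  sf::st_geometry(x) <- sf::st_set_crs(moved, crs)
  x
}

# Usage sketch (offsets are placeholders, in metres for EPSG:3857):
# shapes <- sf::st_transform(county_shapes, 3857)
# is_ak  <- shapes$STATE_NAME == "ALASKA"
# shapes <- rbind(
#   shapes[!is_ak, ],
#   shift_geometry_by(shapes[is_ak, ], dx = 3.5e6, dy = -4.5e6)
# )
```

Because the shift is applied after projecting to Web Mercator, the offsets are in projected metres rather than degrees.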
4. Process and Preview the Data
Preview Combined Election Data: head(combined_elections) shows the first few rows of the cleaned election data.
Preview Shapefile Data: head(county_data) shows the first few rows of the combined county shapefile and election data.
This process ensures that both the election data and geographic data are cleaned, merged, and ready for analysis or visualization in further tasks.
Initial Analysis
Task 4: Initial Analysis Questions
# 1. Which county or counties cast the most votes for Trump (in absolute terms) in 2024?
most_trump_votes <- combined_elections |>
  arrange(desc(Trump_2024)) |>
  head(5) |>
  select(County, State, Trump_2024)

# 2. Which county or counties cast the most votes for Biden (as a fraction of total votes cast) in 2020?
most_biden_pct <- combined_elections |>
  arrange(desc(Biden_pct)) |>
  head(5) |>
  select(County, State, Biden_pct)

# 3. Which county or counties had the largest shift towards Trump (in absolute terms) in 2024?
largest_trump_shift_abs <- combined_elections |>
  arrange(desc(Trump_shift_abs)) |>
  head(5) |>
  select(County, State, Trump_shift_abs)

# 4. Which state had the largest shift towards Harris (or smallest shift towards Trump) in 2024?
state_shifts <- combined_elections |>
  group_by(State) |>
  summarize(
    Total_Trump_2020 = sum(Trump_2020, na.rm = TRUE),
    Total_Trump_2024 = sum(Trump_2024, na.rm = TRUE),
    Total_Biden = sum(Biden, na.rm = TRUE),
    Total_Harris = sum(Harris, na.rm = TRUE),
    Total_Votes_2020 = sum(Total_2020, na.rm = TRUE),
    Total_Votes_2024 = sum(Total_2024, na.rm = TRUE)
  ) |>
  mutate(
    Trump_pct_2020 = Total_Trump_2020 / Total_Votes_2020,
    Trump_pct_2024 = Total_Trump_2024 / Total_Votes_2024,
    State_Trump_Shift = Trump_pct_2024 - Trump_pct_2020
  ) |>
  arrange(State_Trump_Shift) |>
  head(5)

# 5. What is the largest county, by area, in this data set?
largest_county <- county_data |>
  mutate(area_sq_km = as.numeric(st_area(geometry)) / 1e6) |>
  arrange(desc(area_sq_km)) |>
  head(5) |>
  select(COUNTY_NAME, STATE_NAME, area_sq_km)

# 6. Which county had the largest increase in voter turnout in 2024?
turnout_increase <- combined_elections |>
  arrange(desc(Turnout_change)) |>
  head(5) |>
  select(County, State, Total_2020, Total_2024, Turnout_change)
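The queries above rely on derived columns (Biden_pct, Trump_shift_abs, Trump_shift_pct, Turnout_change) whose definitions are not shown in this section. The sketch below gives one plausible set of definitions on made-up data. Only Turnout_change can be checked against the reported results (it equals Total_2024 minus Total_2020 in the turnout table); the other formulas are assumptions, in particular that Trump_shift_abs is a change in raw votes.

```r
library(dplyr)

# Toy data with made-up numbers; column names follow the queries above.
toy <- tibble::tibble(
  County = c("ALPHA", "BETA"),
  State = c("EXAMPLE", "EXAMPLE"),
  Trump_2020 = c(400, 900),
  Biden = c(600, 1000),
  Total_2020 = c(1000, 2000),
  Trump_2024 = c(550, 880),
  Harris = c(500, 1100),
  Total_2024 = c(1100, 2000)
)

with_shifts <- toy |>
  mutate(
    Biden_pct = Biden / Total_2020,
    Trump_pct_2020 = Trump_2020 / Total_2020,
    Trump_pct_2024 = Trump_2024 / Total_2024,
    Trump_shift_abs = Trump_2024 - Trump_2020,         # assumed: change in raw votes
    Trump_shift_pct = Trump_pct_2024 - Trump_pct_2020, # assumed: change in vote share
    Turnout_change = Total_2024 - Total_2020           # change in ballots cast
  )
```

Keeping the raw-vote and vote-share shifts as separate columns matters because a county can gain Trump votes while his share falls, or the reverse, whenever the total number of ballots changes.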
Results of Initial Analysis
1. Counties with the most votes for Trump in 2024
most_trump_votes |> knitr::kable(digits = 0)

| County | State | Trump_2024 |
|:---|:---|---:|
| LOS ANGELES | CALIFORNIA | 1189862 |
| MARICOPA | ARIZONA | 1051531 |
| HARRIS | TEXAS | 722695 |
| ORANGE | CALIFORNIA | 654815 |
| MIAMI-DADE | FLORIDA | 605590 |
1. Counties with the Most Votes for Trump in 2024:
This analysis focuses on counties where Donald Trump received the largest absolute number of votes in the 2024 presidential election.
Why it matters: Knowing which counties cast the most votes for Trump shows where his raw vote total comes from. Because absolute counts scale with population, the list is led by large metropolitan counties rather than by the most Republican ones: Los Angeles County, which is heavily Democratic, tops the table. Populous counties can therefore contribute substantially to a candidate's overall vote count even where that candidate loses locally.
Insights: It highlights the geographic distribution of Trump’s voter base and can indicate trends in voter behavior, such as increasing turnout in conservative areas or changing demographics in traditionally Republican counties.
2. Counties with the highest percentage of votes for Biden in 2020
most_biden_pct |> knitr::kable(digits = 3)

| County | State | Biden_pct |
|:---|:---|---:|
| KALAWAO | HAWAII | 0.958 |
| PRINCE GEORGE’S | MARYLAND | 0.911 |
| OGLALA LAKOTA | SOUTH DAKOTA | 0.905 |
| BALTIMORE CITY | MARYLAND | 0.891 |
| PETERSBURG | VIRGINIA | 0.887 |
2. Counties with the Highest Percentage of Votes for Biden in 2020:
This analysis focuses on the counties where Joe Biden received the highest percentage of the total votes in the 2020 presidential election.
Why it matters: By identifying counties with a high percentage of votes for Biden, this helps to pinpoint Democratic strongholds, often in urban or liberal areas. It may also highlight areas that are more progressive, with a higher proportion of voters from minority communities or younger voters.
Insights: It shows where Biden performed best in 2020. With Democratic shares near or above 90%, these counties are safe rather than contested; their strategic value lies in turnout rather than persuasion, which can inform campaign strategies targeting Democratic voters.
3. Counties with the largest absolute shift towards Trump in 2024
3. Counties with the Largest Absolute Shift Toward Trump in 2024:
This looks at counties that experienced the largest increase in Trump’s vote share from 2020 to 2024 in absolute terms.
Why it matters: This analysis helps to understand where Trump gained support between the two elections, and where his messaging or policies resonated more strongly in 2024 than in 2020. Such shifts might appear in suburban or rural areas.
Insights: Identifying the counties where Trump saw the largest increase provides valuable information on voter behavior changes, shifts in party allegiance, or the effectiveness of his campaign’s efforts. This can reflect changing economic, social, or political conditions in these regions.
4. States with the smallest shift towards Trump (or largest towards Harris)
4. States with the Smallest Shift Toward Trump (or Largest Toward Harris):
This identifies states where there was either the smallest shift toward Trump or the largest shift toward Harris in the 2024 election compared to 2020.
Why it matters: By analyzing state-level shifts, we can see if Trump’s appeal waned in certain regions or if Kamala Harris gained more support in key areas. In some cases, Harris’ presence on the ticket could have drawn more votes to the Democratic side, especially among women, minority communities, or younger voters.
Insights: This helps identify where political polarization might be softening or intensifying. It also gives insight into areas where political campaigns might focus on turning out voters who might be disillusioned with one of the candidates. Understanding these shifts can help predict how these states might vote in future elections.
5. Largest counties by area
largest_county |> knitr::kable(digits = 2)

| COUNTY_NAME | STATE_NAME | area_sq_km | geometry |
|:---|:---|---:|:---|
| YUKON-KOYUKUK | DISTRICT OF COLUMBIA | 2229499.9 | MULTIPOLYGON (((-17927801 8… |
| NORTH SLOPE | DISTRICT OF COLUMBIA | 1865109.8 | MULTIPOLYGON (((-16393475 1… |
| NORTHWEST ARCTIC | DISTRICT OF COLUMBIA | 616764.8 | MULTIPOLYGON (((-17931599 9… |
| BETHEL | DISTRICT OF COLUMBIA | 463851.6 | MULTIPOLYGON (((-17997103 8… |
| NOME | DISTRICT OF COLUMBIA | 340500.2 | MULTIPOLYGON (((-17958000 9… |
5. Largest Counties by Area:
This analysis examines the counties with the largest physical size, measured in square kilometers.
Why it matters: The size of a county often has little to do with its political influence, but it can reflect population density and urban-rural dynamics. Larger counties in terms of area tend to have lower population densities but can have significant political influence in areas with growing or shifting populations.
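Insights: All five entries in the table are boroughs and census areas in Alaska, so the largest counties here are remote, sparsely populated regions. Two features of the table should be read with caution. First, the STATE_NAME column labels these rows DISTRICT OF COLUMBIA, which points to a fault in the state-name lookup rather than a real location. Second, the reported areas are inflated: 2.2 million square kilometers for Yukon-Koyukuk exceeds the area of the entire state of Alaska (about 1.7 million square kilometers), which is consistent with areas measured in Web Mercator coordinates, a projection that stretches features at high latitudes. Large counties like these can be harder for campaigns to reach, although population growth in rural areas could make them more politically significant.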
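The areas in the table (for example, about 2.2 million square kilometers for Yukon-Koyukuk) are larger than Alaska itself, which suggests they were measured on Web Mercator geometries. The sketch below shows how large that distortion is at a given latitude and one possible way to measure area without it. Both helper names are hypothetical, and `area_sq_km()` assumes it receives geometries that have not been repositioned, since shifting Alaska or Hawaii changes their apparent latitude.

```r
# Area inflation factor of the Mercator projection at a given latitude:
# lengths are stretched by sec(latitude), so areas are stretched by sec^2(latitude).
mercator_area_inflation <- function(lat_deg) {
  1 / cos(lat_deg * pi / 180)^2
}

# Inflation at the equator, mid-latitudes, and interior Alaska.
mercator_area_inflation(c(0, 40, 65))

# Hypothetical helper: measure area on longitude/latitude coordinates, where
# sf computes geodesic area instead of planar area in projected units.
area_sq_km <- function(x) {
  lonlat <- sf::st_transform(x, 4326)
  as.numeric(sf::st_area(lonlat)) / 1e6
}

# Usage sketch on the unshifted shapefile:
# county_shapes |> dplyr::mutate(area_sq_km = area_sq_km(geometry))
```

An equal-area projection would be another reasonable choice; the key point is to avoid measuring area in Web Mercator units.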
6. Counties with largest increase in voter turnout
turnout_increase |> knitr::kable(digits = 0)

| County | State | Total_2020 | Total_2024 | Turnout_change |
|:---|:---|---:|---:|---:|
| CLARK | NEVADA | 952782 | 1013239 | 60457 |
| MONTGOMERY | TEXAS | 267759 | 304241 | 36482 |
| DENTON | TEXAS | 411175 | 442024 | 30849 |
| HORRY | SOUTH CAROLINA | 178001 | 204044 | 26043 |
| PINAL | ARIZONA | 182183 | 207582 | 25399 |
6. Counties with the Largest Increase in Voter Turnout in 2024:
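This analysis focuses on counties that saw the largest increase in the number of votes cast from 2020 to 2024. Turnout_change is the difference Total_2024 - Total_2020, so it measures the change in ballots counted, not the change in turnout rate among eligible voters; a raw increase can also reflect population growth.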
Why it matters: Increased voter turnout can indicate growing voter engagement, possibly due to strong political campaigns, shifting demographics, or effective outreach strategies. This also suggests areas where voters felt their vote mattered or were energized by specific candidates or issues.
Insights: Tracking turnout increases can reveal changing political engagement, showing which counties may have been motivated by issues such as the economy, healthcare, or social justice. It also sheds light on successful voter mobilization efforts that could shape future elections.
Visualization
Task 5: Reproduce NYT Figure
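# STATEFP holds numeric FIPS codes, so matching it against state.abb always gives NA.
# Match on the postal abbreviation instead (assumes the shapefile has a STUSPS column).
map_data <- county_shapes |>
  mutate(
    STATE_NAME = state.name[match(STUSPS, state.abb)],
    STATE_NAME = ifelse(is.na(STATE_NAME), "District of Columbia", STATE_NAME),
    STATE_NAME = toupper(STATE_NAME),  # election keys assumed upper-case, as in the result tables
    COUNTY_NAME = toupper(NAME),
    FIPS = paste0(STATE_NAME, "_", COUNTY_NAME)
  ) |>
  left_join(combined_elections, by = "FIPS")

print(paste("Counties with shift data after join:", sum(!is.na(map_data$Trump_shift_pct))))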
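Because the join key is built from names, small spelling differences between the shapefile and the scraped tables leave counties without election data. The sketch below is one way to list the keys that fail to match in each direction. The helper `unmatched_keys()` is hypothetical and the keys in the toy tables are made up to illustrate typical mismatches.

```r
# Hypothetical helper: list keys present in one table but missing from the other.
unmatched_keys <- function(shapes, elections, key = "FIPS") {
  list(
    shapes_only = setdiff(shapes[[key]], elections[[key]]),
    elections_only = setdiff(elections[[key]], shapes[[key]])
  )
}

# Made-up keys showing two common kinds of mismatch (suffixes and punctuation).
shapes_toy <- data.frame(
  FIPS = c("TEXAS_HARRIS", "VIRGINIA_PETERSBURG", "LOUISIANA_ST. LANDRY")
)
elections_toy <- data.frame(
  FIPS = c("TEXAS_HARRIS", "VIRGINIA_PETERSBURG CITY", "LOUISIANA_ST LANDRY")
)

mismatches <- unmatched_keys(shapes_toy, elections_toy)
mismatches$shapes_only
mismatches$elections_only
```

Comparing the two lists side by side usually shows which normalization rules (stripping suffixes, removing punctuation) would recover the remaining counties.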